Nearsighted camera object detection

ABSTRACT

A system and process of nearsighted (myopia) camera object detection involves detecting the objects through edge detection and outlining or thickening them with a heavy border. Thickening may include making the object bold in the case of text characters. The bold characters are then much more apparent and heavier weighted than the background. Thresholding operations are then applied (usually multiple times) to the grayscale image to remove all but the darkest foreground objects in the background resulting in a nearsighted (myopic) image. Additional processes may be applied to the nearsighted image, such as morphological closing, contour tracing and bounding of the objects or characters. The bound objects or characters can then be averaged to provide repositioning feedback for the camera user. Processed images can then be captured and subjected to OCR to extract relevant information from the image.

FIELD OF THE INVENTION

The present invention is related to optical character recognition and, in particular, to the production of images that improve the accuracy of optical character recognition.

BACKGROUND OF THE INVENTION

Consumers have flocked to mobile devices for a range of applications. Popular applications include budgeting and banking applications. To use these applications, a consumer will, for example, take a photo of a paper document that is a receipt or a check. The mobile device then performs some type of optical character recognition on the document, turning the raw image into alphanumeric character data for storage.

Despite some success, consumers are often frustrated by the inaccuracy of the optical character recognition (OCR) process. There are at least several reasons for these inaccuracies. Unlike large, fixed scanners, handheld electronic devices struggle to capture good images for OCR processing. For example, handheld mobile (and other electronic) devices are prone to unsteady and imperfect photographing of the document. In addition, lighting and backgrounds can vary, introducing artefacts and/or affecting the amount of contrast in the image. A handheld device can also suffer from skew introduced by not having the camera's focal plane square with the document itself.

Other challenges are introduced by the documents themselves. Documents have differing characteristics, such as varying fonts, and the OCR process can fail to interpret various stylistic font differences. Varied documents also have varied sizes, leading many banking applications to focus just on checks having a predictable size.

Current applications focus on a mixture of guiding the consumer to take better images and image processing in an attempt to improve accuracy. For example, some banking applications provide the consumer a frame in which to position the check to avoid skew and improve the resolution of the check image. These applications may also reject a check that is insufficiently clear. Conventional image processing can include binarization to remove background artefacts. Despite these improvements, attempts at gathering images of documents for processing and the OCR processing itself, especially with handheld electronic devices, still fail often enough to frustrate consumers. It is therefore desirable to improve the accuracy and efficiency of image capture and OCR processing of documents, especially documents captured using handheld electronic devices.

SUMMARY OF THE INVENTION

Implementations of the present invention include a system and method for generating a “myopic” image that attenuates or eliminates background information and further processing the myopic image to create an OCR conditioned image that improves the likelihood of successful OCR processing. Generally, the method may include pre-processing by obtaining a source image of a foreground document containing characters, detecting edges of the characters, thickening edges of the characters and thresholding the source image to produce a myopic image. Generally, the source image is acquired using a camera of a handheld electronic device. The method may further comprise post-processing activities to produce an OCR conditioned image.

The inventors have also produced an OCR conditioned image having improved OCR accuracy over conventional processes, in ranges of as much as 5% to 100%, depending on environmental conditions such as light levels, paper and foreground/background color contrast. Post-processing performed on the myopic image can include adaptive thresholding, morphological closing, contour tracing and calculating an average object size. In one aspect, if the average object size is not within a predetermined range, position feedback can be provided to a user alerting the user to reposition the camera. Once an image is obtained having at least an average object size within the predetermined range, the improved OCR conditioned image can be transmitted or otherwise provided to an OCR processing system.

In one implementation, a method is provided for generating an image for OCR. The method includes obtaining a source image containing characters. Edges of the characters are detected and thickened, and the source image is thresholded.

Detecting edges of the characters may include estimating a gradient of characters of the source image. Thickening edges of the characters may include determining an absolute gradient magnitude at points within the source image. For example, the edges may be detected and thickened using a 3×3 pixel or larger mask. The mask may be smaller than the average size of the characters. Varied masks may be employed, such as convolution masks.

Estimating the gradient of the characters may be done in a first direction and a second direction, for example an x-direction and a y-direction.

Thickening of the edges of the characters may be performed with a Sobel operator. The Sobel operator may use at least one convolution mask, such as a pair of convolution masks. The convolution masks may be smaller than the characters. The convolution masks may be 3×3 pixels, for example. Use of the Sobel operator may include sliding the convolution mask over the source image.

Thickening the edges may include calculating a magnitude of a gradient of the detected edges of the characters. Thickening the edges may also include estimating the gradient of the detected edges using a mask.

Thresholding may include using an assumption of a foreground and background in the source image. For example, thresholding may include determining an optimal threshold value. Determining the optimal threshold value may include minimizing within-class variance of the foreground and background. Minimizing within-class variance may also include weighting of the foreground and background.

Thresholding may also include removing grayscale from a background of the source image. Thresholding may further include using histogram segmentation. Thresholding may also include using Otsu global thresholding with a block size smaller than an average size of the characters.

Thresholding may be repeated until a nearsighted image is generated. Also, characters may be repaired by morphologically closing them after thresholding. Morphologically closing may include use of a structuring element. The structuring element may be a line-shaped structuring element to fill gaps within the characters.

The method may also include determining a contour of the characters, such as by determining contour points. Determining the contour may also include determining a contour hierarchy. Determining the contour may also include using a Suzuki and Abe algorithm. Contours with fewer than three contour points may be dropped. Contour points may be approximated as polygonal curves. Also, approximating the contour points may include reducing the contour to a simple closed polygon.

The method may also include bounding the contour. Bounding may, for example, include circumscribing the contour with a rectangle. Circumscribing may include determining a minimal upright bounding rectangle for the contour. A plurality of contours may be used to approximate rows of characters.

The method may further include determining an average height of the rows of characters. Also, the method may include determining an average font height for the characters based on the average height of rows of the characters. Also, the method may include performing OCR using the average font height.

In another implementation, obtaining the source image containing characters comprises continuously acquiring the source image and dynamically detecting the edges of the characters, thickening the edges of the characters, and thresholding the source image while the source image is being continuously acquired. Continuously acquiring the source image may be performed, for example, by a handheld electronic device. The handheld electronic device may further include a display. An image displayed by the handheld electronic device may include the image for optical character recognition.

Implementations of the present invention provide many advantages. Measurement of the distance of the lens from the paper facilitates capture of a font object size for improved clarity. The improved clarity results in improved OCR recognition rates as compared to freehand capture of the image. Implementations also provide an ability to calculate optimal font size for OCR detection on a live video feed while accounting for optimal focus and clarity. Implementations of the present invention can measure and record optimal focal length and OCR font size ranges on raw video feed. These measurements can be used to guide the camera user through visual cues and indicators to move the camera to the best location in space. This produces a better OCR compatible image for text recognition. The focal ratio determines how much light is picked up by the CCD chip in a given amount of time. The number of pixels in the CCD chip will determine the size of a font text character matrix. More pixels means a bigger font size, regardless of the physical size of the pixels. OCR engines have an expected and optimal size range for character comparison. When fonts are in the optimal range and have clear, crisp, well-defined edges, OCR detection and accuracy are improved. Implementations of the present invention provide guidance to that optimal range.

These and other features and advantages of the present invention will become more readily apparent to those skilled in the art upon consideration of the following detailed description and accompanying drawings, which describe both the preferred and alternative embodiments of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A and 1B illustrate an exemplary method and overview system for generating a “myopic” image that attenuates or eliminates background information and further processing the myopic image to create an OCR conditioned image that improves the likelihood of successful OCR processing;

FIG. 2 is an illustration showing a handheld electronic device having a camera with a focal region and a particular defined focal length;

FIG. 3 illustrates an exemplary pair of convolution masks comprising 3×3 pixel rectangles that can be used by a Sobel edge detector;

FIG. 4 is an image of the source image after edge detection has been performed on the characters of the source image;

FIG. 5 shows a sample pixel set from an image before (on the left) and after (on the right) Otsu thresholding is applied;

FIG. 6 illustrates before and after images of adaptive thresholding;

FIG. 7 illustrates the use of a morphological closing process using a structural element to repair gaps in characters;

FIG. 8 shows an exemplary line-shape structuring element for morphological closing;

FIG. 9 shows exemplary before (left) and after (right) images for morphological closing;

FIGS. 10 and 11, respectively, show an exemplary black-and-white image and its connected component matrix that can be used in a contour tracing process, which uses the size of each element or pixel to measure the height and width of the sequence;

FIG. 12 shows an example of the Suzuki and Abe process building the sequence (in the form of a tree of elements) from an image;

FIG. 13 shows before (left) and after (right) images where the algorithm traced the contours of an “A” character;

FIGS. 14 and 15 show portions of a bounding process where a bounding row box or rectangle can be placed around each character (as shown in FIG. 14) and a row of characters (as shown in FIG. 15) and the bounded boxes can be used to determine the average object or character size;

FIG. 16 shows a graphical display on the handheld electronic device;

FIG. 17 shows a schematic of the relative (1 m along the optical axis) positioning of the lens of the camera with respect to the character “A” on the foreground document;

FIG. 18 shows an exemplary structuring element comprising a 20×3 line segment used to repair a cursive “j” character in a morphological closing process;

FIG. 19 is a schematic block diagram of an entity capable of performing the processes described herein; and

FIG. 20 is a schematic block diagram of an exemplary handheld electronic device mobile station capable of operating in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The present invention now will be described more fully hereinafter with reference to specific embodiments of the invention. Indeed, the invention can be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. As used in the specification, and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. The term “comprising” and variations thereof as used herein is used synonymously with the term “including” and variations thereof and are open, non-limiting terms. “Exemplary” means “an example of” and is not intended to convey an indication of a preferred or ideal embodiment. “Such as” is not used in a restrictive sense, but for explanatory purposes.

As will be appreciated by one skilled in the art, the methods and systems may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the methods and systems may take the form of a computer program product on a computer-readable storage medium having computer-readable program instructions (e.g., computer software) embodied in the storage medium. More particularly, the present methods and systems may take the form of web-implemented computer software. Any suitable computer-readable storage medium may be utilized including hard disks, CD-ROMs, optical storage devices, or magnetic storage devices.

The methods and systems are described with reference to block diagrams and flowchart illustrations of methods, systems, apparatuses and computer program products. It will be understood that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, respectively, can be implemented by computer program instructions. These computer program instructions may be loaded onto a handheld electronic device, a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data processing apparatus create a means for implementing the functions specified in the flowchart block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including computer-readable instructions for implementing the function specified in the flowchart block or blocks. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions that execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.

Accordingly, blocks of the block diagrams and flowchart illustrations support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, can be implemented by special purpose hardware-based computer systems that perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.

Implementations of the present invention include a system and method for generating a “myopic” image that attenuates or eliminates background information and further processing the myopic image to create an OCR conditioned image that improves the likelihood of successful OCR processing. Generally, as shown in FIGS. 1A and 1B, the method may include pre-processing 56 by obtaining 10 a source image 12 of a foreground document 24 containing characters 14, detecting 16 edges of the characters 14, thickening 18 edges of the characters 14 and thresholding 20 the source image 12 to produce a myopic image 58. Generally, the source image 12 is acquired using a camera 60 of a handheld electronic device 22. The method illustrated in FIG. 1A further comprises post-processing activities 62 to produce an OCR conditioned image 64. The inventors have produced an OCR conditioned image 64 having improved OCR accuracy over conventional processes, in ranges of as much as 5% to 100%, depending on environmental conditions such as light levels, paper and foreground/background color contrast. Post-processing performed on the myopic image 58 can include adaptive thresholding 28, morphological closing 30, contour tracing 32 and calculating an average object size 46. In one aspect, if the average object size is not within a predetermined range, position feedback 66 can be provided to a user alerting the user to reposition the camera 60. Once an image is obtained having at least an average object size within the predetermined range, the improved OCR conditioned image 64 can be transmitted 52 or otherwise provided to an OCR processing system 54.

As shown in FIG. 2, a handheld electronic device 22 has a camera with a focal region (within the box) and a particular defined focal length. Various documents or other objects or images 24 within the focal region may be picked up within an image generated by the electronic device 22. For example, the consumer may hold up a foreground document 24 (such as a receipt) and behind it may be various background objects or documents 26, such as signs within a restaurant generating the receipt. An issue with the background documents 26 is that they might get captured in the OCR process and/or may interfere with the OCR of characters 14 on the foreground document 24.

Some aspects of the present invention address this issue by providing (in a simplified description not necessarily capturing all possible permutations or complexities) a process for “nearsighted” or “myopic” capture of information that helps to exclude background objects. The nearsighted capture effectively blurs, attenuates and/or eliminates artefacts or other characters that are further away than the document of interest, thus improving the accuracy of the OCR process.

Generally, the process of nearsighted (myopia) camera object detection involves detecting 16 the objects through edge detection and outlining or thickening 18 them with a heavy border. (Thickening may include making the object bold in the case of text characters.) The bold characters are then much more apparent and heavier weighted than the background, which tends to be grayscale or at least blurred, being outside preferred focal lengths. Thresholding 20 operations are then applied (optionally, multiple times) to the grayscale image to remove all but the darkest foreground objects in the background, resulting in a nearsighted (myopic) image.

Other aspects of systems and methods also facilitate improved image capture by providing feedback 66 to the consumer on the positioning 50 of the foreground document 24 within an acceptable focal length of the handheld electronic device 22. Generally, the system and method facilitate positioning by continuously processing captured images, determining average character sizes of the indicia on those images and comparing them to expected font sizes. The handheld electronic device 22 then provides feedback 66 that can include visual cues (such as a slider bar and green or red status colors) on a display to guide the consumer in repositioning the camera relative to the document 24, haptic feedback, audible feedback, or combinations thereof.
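The comparison of average character size against an expected range can be sketched as follows. This is a minimal illustration only: the function name `position_feedback`, the pixel range arguments and the returned messages are hypothetical, and the sketch assumes that characters smaller than the expected range mean the camera is too far from the document.

```python
def position_feedback(box_heights, min_height, max_height):
    """Compare the average character height (in pixels) with an expected range.

    box_heights: heights of the bounding boxes found around characters.
    min_height, max_height: hypothetical bounds of the size range an OCR
    engine handles well; real values would have to be measured.
    """
    if not box_heights:
        return "no text detected"
    average = sum(box_heights) / len(box_heights)
    if average < min_height:
        return "move closer"      # characters too small: camera assumed too far
    if average > max_height:
        return "move away"        # characters too large: camera assumed too near
    return "ok"


print(position_feedback([10, 12, 14], 20, 40))
```

A display could map these results to the slider bar and green or red status colors described above.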

As shown in FIGS. 1A and 1B, the handheld electronic device 22 obtains 10 one or more source images 12. The source images may be generated by a camera 60 attached to, part of or integrated into the handheld electronic device 22. Or, the source images 12 may already be in a memory of the handheld electronic device 22. Or, the source images 12 may be received from some other camera or image capture device or from storage associated with such a device. (And combinations of the aforementioned sources may provide the source images 12.)

Despite the availability of other options, most implementations of the present invention are well suited for mobile electronic devices 22 including a camera 60 and generating source images 12 in the present. For example, the handheld electronic device 22 may be a phone with a camera capturing video (and multiple source images per second) of the foreground document 24.

As shown in FIGS. 1A and 1B, 3 and 4, the process includes detecting 16 edges of the source image 12. For example, a Sobel edge detection application or process may be employed for a 2-D spatial gradient measurement on the image. Sobel operators are discrete differentiation operators. Generally, the Sobel edge detection application may approximate an absolute gradient magnitude at each point in a grayscale source image 12. The Sobel edge detection algorithm may be configured with a relatively small window size, such as a window smaller than the expected pixel size of the objects or characters to be processed. For example, the Sobel edge detector has a pair of convolution masks that may be, as shown in FIG. 3, 3×3 pixel rectangles. One of the convolution masks estimates the gradient in the x-direction (Gx or columns) and the other estimates the gradient in the y-direction (Gy or rows). The Sobel operator slides the mask over the source image one pixel at a time; thus it manipulates one square of pixels at a time.

The convolution masks are represented by the following equations and/or pseudo-code:

    int GX[3][3];
    int GY[3][3];

    /* 3x3 GX Sobel mask */
    GX[0][0] = -1; GX[0][1] = 0; GX[0][2] = 1;
    GX[1][0] = -2; GX[1][1] = 0; GX[1][2] = 2;
    GX[2][0] = -1; GX[2][1] = 0; GX[2][2] = 1;

    /* 3x3 GY Sobel mask */
    GY[0][0] =  1; GY[0][1] =  2; GY[0][2] =  1;
    GY[1][0] =  0; GY[1][1] =  0; GY[1][2] =  0;
    GY[2][0] = -1; GY[2][1] = -2; GY[2][2] = -1;

The Sobel operator also calculates the magnitude of the gradient:

$|G| = \sqrt{G_x^2 + G_y^2}$

Additional pseudo-code illustrates movement of the mask across the image, gradient approximation and other operations in full context.

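    sImage originalImage;   /* Input image */
    sImage edgeImage;       /* Output (edge) image */
    int X, Y, I, J;
    long sumX, sumY;
    int SUM;

    /*---------------------------------------------------*/
    for (Y = 0; Y <= (originalImage.rows - 1); Y++) {
        for (X = 0; X <= (originalImage.cols - 1); X++) {
            sumX = 0;
            sumY = 0;

            /* The 3x3 mask does not fit on the image boundary,
               so boundary pixels are treated as having no edge. */
            if (Y == 0 || Y == originalImage.rows - 1 ||
                X == 0 || X == originalImage.cols - 1) {
                SUM = 0;
            } else {
                /*-------X GRADIENT APPROXIMATION------*/
                for (I = -1; I <= 1; I++) {
                    for (J = -1; J <= 1; J++) {
                        sumX = sumX + (int)((*(originalImage.data + X + I +
                               (Y + J) * originalImage.cols)) * GX[I + 1][J + 1]);
                    }
                }

                /*-------Y GRADIENT APPROXIMATION-------*/
                for (I = -1; I <= 1; I++) {
                    for (J = -1; J <= 1; J++) {
                        sumY = sumY + (int)((*(originalImage.data + X + I +
                               (Y + J) * originalImage.cols)) * GY[I + 1][J + 1]);
                    }
                }

                /*---GRADIENT MAGNITUDE APPROXIMATION (Myler p.218)----*/
                SUM = abs((int)sumX) + abs((int)sumY);
            }

            if (SUM > 255) SUM = 255;
            if (SUM < 0) SUM = 0;

            /* Invert so that edges are dark on a light background */
            *(edgeImage.data + X + Y * originalImage.cols) = 255 - (unsigned char)(SUM);
        }
    }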

Generally, then, the Sobel operator changes a pixel's value to the value of the mask output. Then it shifts one pixel to the right, calculates again, and continues to the right until it reaches the end of a row. The Sobel operator then starts at the beginning of the next row. As shown in FIG. 4, the Sobel operator hollows out the internal pixels of the characters and thickens the edges, generally providing a highlighting effect. Restated, the edge detection highlights the foreground object or text characters to make them bold and have a heavy weight in the grayscale image. Notably, Sobel operators are not the only processes that can detect and thicken edges, but the inventors have found the Sobel operator and particular mask size to be well-suited for receipts.

Another implementation of the Sobel operator uses the following kernel for noise reduction:
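The sliding-mask behavior described above can be illustrated with a short, self-contained sketch. It assumes the image is a plain list of rows of 0 to 255 integers, uses the |Gx| + |Gy| magnitude approximation, and leaves border pixels white because the 3×3 mask does not fit there; the function name `sobel_edges` and the border handling are assumptions made for illustration, not part of the disclosed method.

```python
# 3x3 Sobel masks as listed in the text, indexed [row][column].
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def sobel_edges(image):
    """Return an inverted edge map (dark edges on white) of a grayscale image."""
    rows, cols = len(image), len(image[0])
    # Border pixels stay white: the mask cannot be centered there (assumption).
    out = [[255] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            sum_x = sum_y = 0
            for j in range(-1, 2):          # row offset within the mask
                for i in range(-1, 2):      # column offset within the mask
                    pixel = image[y + j][x + i]
                    sum_x += pixel * GX[j + 1][i + 1]
                    sum_y += pixel * GY[j + 1][i + 1]
            # Approximate the gradient magnitude and clamp to 8 bits.
            magnitude = min(255, abs(sum_x) + abs(sum_y))
            out[y][x] = 255 - magnitude
    return out


# A vertical step from black to white produces a two-pixel-wide dark edge.
step = [[0, 0, 0, 255, 255] for _ in range(5)]
print(sobel_edges(step)[2])
```

Both pixels adjacent to the step respond, which is the thickening effect the text attributes to the operator.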

$x = \begin{bmatrix} -3 & 0 & +3 \\ -10 & 0 & +10 \\ -3 & 0 & +3 \end{bmatrix}$ $y = \begin{bmatrix} -3 & -10 & -3 \\ 0 & 0 & 0 \\ +3 & +10 & +3 \end{bmatrix}$

The kernel window is moved over the image with no scale or shift in delta. This kernel, for example, can be employed with the following variables submitted to the Sobel operator:

    Sobel(in = inputImage, out = outputImage, GrayScale, x_order = 1 and y_order = 0, KernelSize = 3, scale = 1, delta shift = 0, DrawSolidBorderOnEdge = IntensitySuroundingWindowPixelsMax)

wherein:

    Rectangle rects[ ]      //- Rectangle Array
    Image inputImage        //- Pumped in Video Frame
    Image outputImage       //- Output Image after standard operations
    Image outputImage2      //- Output Image after optional operations.

Kernel selection and size can be adjusted for different foreground object types, such as checks, receipts, business cards, etc. The inventors, however, determined the disclosed particular order of steps and kernel selection to be particularly effective.

As shown in FIGS. 1A and 1B, the method includes thresholding 20 of the source image 12. For example, an Otsu thresholding may be applied. The Otsu thresholding makes an automatic binarization level decision based on histogram shape. Although other binarizing and/or thresholding routines may be applied, the Otsu thresholding has an algorithm that assumes the source image 12 is composed of two basic classes. These two basic classes, foreground and background, work well with the Sobel operator for myopic image generation.

FIG. 5 shows a sample pixel set from an image before (on the left) and after (on the right) Otsu thresholding is applied. During application, Otsu thresholding computes an optimal threshold value that minimizes the within-class variance for the background and foreground classes. Minimizing the within-class variance has the same effect as maximizing the between-class variance. Thus, as shown in FIG. 5, the image on the right fills foreground pixels and nulls background pixels into a binary image. The quantities used in the calculation are defined as follows:

                Background      Foreground
    Weight      $W_b$           $W_f$
    Mean        $\mu_b$         $\mu_f$
    Variance    $\sigma_b^2$    $\sigma_f^2$

Within Class Variance:

$\sigma_W^2 = W_b \sigma_b^2 + W_f \sigma_f^2$

Between Class Variance:

$\begin{aligned} \sigma_B^2 &= \sigma^2 - \sigma_W^2 \\ &= W_b (\mu_b - \mu)^2 + W_f (\mu_f - \mu)^2 \quad (\text{where } \mu = W_b \mu_b + W_f \mu_f) \\ &= W_b W_f (\mu_b - \mu_f)^2 \end{aligned}$

Pseudocode of the Otsu thresholding is shown below:

    // Calculate histogram
    int ptr = 0;
    while (ptr < srcData.length) {
        int h = 0xFF & srcData[ptr];
        histData[h]++;
        ptr++;
    }

    // Total number of pixels
    int total = srcData.length;

    float sum = 0;
    for (int t = 0; t < 256; t++) sum += t * histData[t];

    float sumB = 0;
    int wB = 0;
    int wF = 0;

    float varMax = 0;
    threshold = 0;

    for (int t = 0; t < 256; t++) {
        wB += histData[t];               // Weight Background
        if (wB == 0) continue;

        wF = total - wB;                 // Weight Foreground
        if (wF == 0) break;

        sumB += (float) (t * histData[t]);

        float mB = sumB / wB;            // Mean Background
        float mF = (sum - sumB) / wF;    // Mean Foreground

        // Calculate Between Class Variance
        float varBetween = (float) wB * (float) wF * (mB - mF) * (mB - mF);

        // Check if new maximum found
        if (varBetween > varMax) {
            varMax = varBetween;
            threshold = t;
        }
    }

The range of the histogram is −1 to 255 in grayscale intensity.Variables may be sent to the Otsu operator to set the histogram range:

    Otsu_Threshold(in = outputImage, out = outputImage, Histogram_From = -1, Histogram_To = 255, BlackForegroundWhiteBackground).
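The statement that minimizing the within-class variance is equivalent to maximizing the between-class variance can be checked numerically. The sketch below is illustrative only: it uses a toy six-bin histogram chosen for the example (not data from the disclosure), normalized weights, and hypothetical helper names `class_stats`, `otsu_variances` and `otsu_threshold`.

```python
def class_stats(hist, lo, hi):
    """Weight, mean and variance of bins lo..hi-1; weight is relative to all pixels."""
    total = sum(hist)
    count = sum(hist[lo:hi])
    if count == 0:
        return 0.0, 0.0, 0.0
    mean = sum(t * hist[t] for t in range(lo, hi)) / count
    var = sum(hist[t] * (t - mean) ** 2 for t in range(lo, hi)) / count
    return count / total, mean, var


def otsu_variances(hist, t):
    """Within- and between-class variance when bins 0..t form the background."""
    w_b, mu_b, var_b = class_stats(hist, 0, t + 1)
    w_f, mu_f, var_f = class_stats(hist, t + 1, len(hist))
    within = w_b * var_b + w_f * var_f
    between = w_b * w_f * (mu_b - mu_f) ** 2
    return within, between


def otsu_threshold(hist):
    """Threshold that maximizes the between-class variance."""
    return max(range(len(hist) - 1), key=lambda t: otsu_variances(hist, t)[1])


hist = [8, 7, 2, 6, 9, 4]                 # toy histogram, six gray levels
total_var = class_stats(hist, 0, len(hist))[2]
for t in range(len(hist) - 1):
    within, between = otsu_variances(hist, t)
    # The two terms always add up to the total variance of the image.
    assert abs(within + between - total_var) < 1e-9
print(otsu_threshold(hist))
```

Because the total variance does not depend on the threshold, the threshold with the largest between-class term is also the one with the smallest within-class term.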

Thresholding may also additionally or alternatively include an adaptive thresholding 28 for strong edge segmentation. Adaptive thresholding using a small block size can result in erosion and highlighting of only the strongest edges. Adaptive thresholding beneficially can dynamically remove noise for the nearsighted camera operation. Adding the second (or additional) thresholding process segments the images, separating weak edges from strong edges.

For example, the destination pixel (dst) is calculated as the mask window is passed over the image:

${{dst}\left( {x,y} \right)} = \left\{ \begin{matrix}0 & {{{if}\mspace{14mu} {{src}\left( {x,y} \right)}} > {T\left( {x,y} \right)}} \\{maxValue} & {otherwise}\end{matrix} \right.$

where T(x, y) is a threshold calculated individually for each pixel.The threshold value T(x, y) is a mean of the blockSize×blockSizeneighborhood of (x, y) minus C.With a small neighborhood, adaptive thresholding functions like adaptiveedge detection —highlighting only the strongest edges.
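A minimal sketch of this per-pixel rule follows, assuming a grayscale image stored as a list of rows. The function name `adaptive_threshold`, the default block size, and the choice to clip neighborhoods at the image border are assumptions for illustration; the text does not specify border handling.

```python
def adaptive_threshold(src, block_size=3, c=0, max_value=255):
    """Inverted mean adaptive threshold: pixels brighter than T(x, y) become 0.

    T(x, y) is the mean of the block_size x block_size neighborhood minus c.
    Neighborhoods are clipped at the image border (an assumption).
    """
    rows, cols = len(src), len(src[0])
    half = block_size // 2
    dst = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            ys = range(max(0, y - half), min(rows, y + half + 1))
            xs = range(max(0, x - half), min(cols, x + half + 1))
            window = [src[j][i] for j in ys for i in xs]
            t = sum(window) / len(window) - c
            dst[y][x] = 0 if src[y][x] > t else max_value
    return dst


# A single dark pixel on a bright background is the only pixel kept.
patch = [[200, 200, 200], [200, 50, 200], [200, 200, 200]]
print(adaptive_threshold(patch))
```

With a small block, only pixels that are darker than their immediate surroundings survive, which is why the operation behaves like an edge detector.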

Generally, the adaptive thresholding 28 divides the image into a number of equal blocks. It calculates the threshold value inside each of the blocks. Then the mean value of all the blocks is calculated. Mean values below a threshold result in removal of blocks (left hand side of FIG. 6) while the values above the threshold result in fill (right hand side of FIG. 6). Symbolically, the variance is defined:

$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (T_i - \mu)^2$

wherein $T_i$ is the threshold value of each block, $\mu$ is the mean of all blocks, and $n$ is the number of blocks.

Thus, as the block window is passed over the image, pixels are filled with black or removed with a fill of white depending on the concentrations in the block of primary black or white. The adaptive thresholding then can be a form of thinning operation leaving only the strongest edges, which generally should be foreground objects, such as characters 14 on the foreground object 24.

In one implementation, adaptive thresholding (or erosion) 28 is by way of a 7×7 pixel kernel. The thresholding uses the mean of the kernel pixels to determine black or white for the kernel window moving over the image after global segmentation by the Otsu operation. Thus, squares of 7×7 pixels are forced into black or white, such as is shown in the following variable selection for an adaptive threshold application:

BlockSize = 7 int Thresh_Kernel[BlockSize][ BlockSize]AdaptiveThresholdErosion(in = outputImage, out = outputImage2,Histogram_From = −1 Histogram_To = 255, Kernel = Thresh_Kernel,BlackBackgroundWhiteForeground_Inverse).Generally, then, this thresholding operation completes washing out ofthe background to generate a nearsighted or myopic image.

Another thresholding operation may make a second, third or otherwise additional (or only) pass over the image. This operation may be optional based on the mean light level in the histogram: additional thresholding can be skipped if the image is already light. This is demonstrated by the pseudocode below:

BOOL TreatWithSecondPassErosionImage

The mean and standard deviation of the grayscale image are determined:

var cvMean
var cvStddev
get_meanStdDev(in = inputImage, out = cvMean, out = cvStddev)

The low extreme of the mean is set to determine whether to employ additional thresholding:

if( cvMean.val[0] < 120 && cvStddev.val[0] > 40 ) // Dark
{
    TreatWithSecondPassErosionImage = TRUE
}
else if( cvMean.val[0] >= 120 && cvMean.val[0] < 200 && cvStddev.val[0] < 40 ) // Medium
{
    TreatWithSecondPassErosionImage = TRUE
}
else if( cvMean.val[0] >= 200 && cvStddev.val[0] < 40 ) // Light
{
    TreatWithSecondPassErosionImage = FALSE
}
else // Anything else
{
    TreatWithSecondPassErosionImage = TRUE
}

// Use one or the other of the images
if( TreatWithSecondPassErosionImage == TRUE )
{
    outputImage = outputImage2
}

In any case, the resulting myopic image is then ready for the next phase of OCR processes and/or can be used to facilitate adjustment of the relative positioning of the object and mobile electronic device 22. Generally, computer vision algorithms are applied to the resulting image for improved accuracy in object size detection. The method may, for example, include morphological closing 30, contour tracing 32 and bounding 34 of the objects or characters 14, as shown in FIGS. 1A and 1B.

The morphological closing 30 process uses a structuring element to repair gaps in characters, as shown in FIG. 7. The nearsighted operation washes out the background and can damage foreground objects. The morphological closing 30 process repairs and closes the lines in the foreground objects based on a line-shaped structuring element. The line-shaped structuring element fills the font or text character objects, repairing the damage. FIG. 8 shows the line-shaped structuring element and FIG. 9 shows the before (left) and after (right) images for morphological closing.

An exemplary structuring element is a 20×3 line segment, which is used to repair a cursive “j” character, as shown in FIG. 18.
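Morphological closing is a dilation followed by an erosion with the same structuring element. The sketch below illustrates the gap-repair effect on a binary image (1 = foreground) using a small, odd-sized rectangular element rather than the 20×3 element mentioned above. The function names, the default sizes and the border handling (only in-bounds pixels are considered) are assumptions made for illustration.

```python
def _window(image, y, x, half_w, half_h):
    """Yield the in-bounds pixels under a rectangular element centered at (x, y)."""
    height, width = len(image), len(image[0])
    for ny in range(max(0, y - half_h), min(height, y + half_h + 1)):
        for nx in range(max(0, x - half_w), min(width, x + half_w + 1)):
            yield image[ny][nx]


def dilate(image, se_width, se_height):
    """Set a pixel when any pixel under the element is foreground."""
    half_w, half_h = se_width // 2, se_height // 2
    return [[1 if any(_window(image, y, x, half_w, half_h)) else 0
             for x in range(len(image[0]))]
            for y in range(len(image))]


def erode(image, se_width, se_height):
    """Keep a pixel only when every pixel under the element is foreground."""
    half_w, half_h = se_width // 2, se_height // 2
    return [[1 if all(_window(image, y, x, half_w, half_h)) else 0
             for x in range(len(image[0]))]
            for y in range(len(image))]


def morphological_close(image, se_width=5, se_height=1):
    """Closing: dilation followed by erosion with the same element."""
    dilated = dilate(image, se_width, se_height)
    return erode(dilated, se_width, se_height)


# A horizontal stroke with a one-pixel gap at index 4
broken_stroke = [[0, 0, 1, 1, 0, 1, 1, 0, 0]]
repaired = morphological_close(broken_stroke, se_width=3, se_height=1)
```

A line-shaped element only bridges gaps along its own direction, so the stroke is repaired without thickening it in the perpendicular direction.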

The contour tracing 32 process gathers objects and sizes. These objects and sizes are used to determine the average text object size on the foreground document 24. The contour tracing 32 process includes detection of edges that yield contours of the underlying object. Generally, the objects with contours will be closed objects. The matrix of a particular image includes trees or lists of elements that are sequences. Every entry in the sequence encodes information about the location of the next point of the object or character.

FIGS. 10 and 11, respectively, show an exemplary black-and-white image and its connected component matrix. The contour tracing 32 process uses the size of each element or pixel to measure the height and width of the sequence. As a result, the contour tracing 32 process determines how many characters or objects are in the nearsighted image and the size of each object or character.
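The counting and sizing of objects can be illustrated with a simple connected-component pass. This sketch is a stand-in that uses a breadth-first flood fill with 8-connectivity, not hierarchical border following; it only shows how object count, width and height can be obtained from a binary image. The function name and the (x, y, width, height) tuple layout are assumptions.

```python
from collections import deque


def component_boxes(image):
    """Return one (x, y, width, height) box per 8-connected foreground object."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] == 0 or seen[y][x]:
                continue
            # Flood fill from this unvisited foreground pixel
            queue = deque([(y, x)])
            seen[y][x] = True
            min_x = max_x = x
            min_y = max_y = y
            while queue:
                cy, cx = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


# Two objects: a 2x2 block and a vertical stroke three pixels tall
binary_image = [[1, 1, 0, 0, 0],
                [1, 1, 0, 0, 1],
                [0, 0, 0, 0, 1],
                [0, 0, 0, 0, 1]]
boxes = component_boxes(binary_image)
```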

An exemplary process for contour tracing 32 includes using the Suzuki and Abe algorithm. Generally, the algorithm determines topological information about contours of objects using hierarchical border following. FIG. 12 shows an example of the Suzuki and Abe process building the sequence (in the form of a tree of elements) from an image. FIG. 13 shows before (left) and after (right) images where the algorithm traced the contours of an “A” character. As an additional step, contour tracing 32 includes elimination of contours with fewer than three contour points, or otherwise not enough points to form a character or desired object.

Contour tracing 32 also can include a shape approximation process. Assuming that most contour points form polygonal curves with multiple vertices, the shape can be approximated with a less complex polygon. The shape approximation process may include, for example, the Ramer-Douglas-Peucker (RDP) algorithm. The RDP algorithm finds a similar curve with fewer points, with a dissimilarity less than or equal to a specified approximation accuracy. The shape approximation process facilitates bounding 34 by reducing the contours of the characters to simple closed polygon shapes.
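A minimal sketch of the Ramer-Douglas-Peucker simplification follows, written for an open polyline of (x, y) points. The parameter `epsilon` plays the role of the approximation accuracy; the function names are illustrative, and a closed contour would first need to be split into open runs, which this sketch does not do.

```python
import math


def _point_line_distance(point, start, end):
    """Perpendicular distance from point to the line through start and end."""
    (px, py), (sx, sy), (ex, ey) = point, start, end
    dx, dy = ex - sx, ey - sy
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to point-to-point distance
        return math.hypot(px - sx, py - sy)
    return abs(dy * (px - sx) - dx * (py - sy)) / math.hypot(dx, dy)


def rdp(points, epsilon):
    """Simplify a polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = _point_line_distance(points[i], start, end)
        if dist > max_dist:
            index, max_dist = i, dist
    if max_dist > epsilon:
        # Keep the farthest point and simplify both halves recursively
        left = rdp(points[:index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right
    return [start, end]


# An L-shaped run of five points reduces to its three corner points
simplified = rdp([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)], epsilon=0.5)
```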

In one implementation, the following variables are submitted to the Suzuki and Abe application:

Objects objects[ ]   //- array of objects
Objects objects2[ ]  //- array of objects meeting filtered size and component
FindObjects(in = outputImage, out = objects, FindOutsideOnlyContour)

Notably, this submission is only concerned with the outside shape of the objects to allow them to be bound within another shape, such as a box which represents the minimum and maximum x and y pixel coordinates of the object.

The bounding 34 process places a peripheral boundary around each character and around each row of characters 14. For example, a bounding row box or rectangle 34 can be placed around each character (as shown in FIG. 14) and around a row of characters 14 (as shown in FIG. 15). The process uses the bounding row rectangle 34 to determine the average object or character size.

The bounding 34 process calculates and returns the minimal up-right bounding rectangle 34 for the specified points of an approximated contour for an object or character. The contour of the object is used to approximate a row of text objects. The heights of the rows are then averaged to get an average character font height for the document. In exemplary pseudocode, the process submits variables for averaging the height and returning an average object size height:

long heightSum = 0
double fontScale = 0
for(int i = 0; i < rects.size( ); i++)
{
    heightSum += rects[i].height;
}
if(rects.size( ) > 1)
{
    // Cast so the average is not truncated by integer division
    fontScale = (double) heightSum / rects.size( )
}

Optionally, the bounding 34 process may include a filter that excludes objects of certain size parameters. For example, polygon objects with fewer than 2 or 3 components may be excluded. A more complex filter, which excludes objects outside a font size of 2 to 19 pixels, is shown by the following pseudocode:

for(int i = 0; i < objects2.size( ); i++ )
{
    // When we move the camera far away,
    // the bounding rectangle can become 2 lines combined
    // filter these out
    if( (objects2[i].Rect.width / 1.5) > objects2[i].Rect.height )
    {
        // Keep objects that are 2 pixels to 19 pixels in size
        if( objects2[i].Rect.height > 1 && objects2[i].Rect.height < 20 )
        {
            rects.add(objects2[i].Rect);
        }
    }
}

wherein the filter blocks rectangles around objects whose width is not at least 50% larger than their height. Also, the filter may exclude objects (characters) that have a size of less than 2 pixels or greater than 19 pixels. Although other filter parameters are possible, the inventors have found that these parameters work well for images of financial documents such as receipts.
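The filter and the height averaging can be combined into a short worked example. The sketch below mirrors the pseudocode's conditions (width more than 1.5 times the height, height between 2 and 19 pixels, and an average only when more than one rectangle remains); the function names, the (x, y, width, height) tuple layout and the sample rectangles are hypothetical.

```python
def filter_text_rects(rects, min_height=2, max_height=19, min_aspect=1.5):
    """Keep row rectangles that are wide enough and within the height range."""
    kept = []
    for (x, y, width, height) in rects:
        wide_enough = width / min_aspect > height
        height_ok = min_height <= height <= max_height
        if wide_enough and height_ok:
            kept.append((x, y, width, height))
    return kept


def average_height(rects):
    """Average rectangle height; 0.0 unless more than one rectangle is given.

    The more-than-one condition mirrors the pseudocode above.
    """
    if len(rects) <= 1:
        return 0.0
    return sum(height for (_, _, _, height) in rects) / len(rects)


candidates = [
    (0, 0, 100, 12),   # kept: wide row, 12 px tall
    (0, 20, 90, 14),   # kept: wide row, 14 px tall
    (0, 40, 10, 30),   # dropped: taller than wide
    (0, 80, 20, 18),   # dropped: width is not 1.5x the height
    (0, 100, 300, 1),  # dropped: only 1 px tall
]
rows = filter_text_rects(candidates)
font_scale = average_height(rows)
# Two rows remain, so font_scale is (12 + 14) / 2 = 13.0
```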

In another aspect of the present invention, as shown in FIGS. 1A and 1B, the source images 12 may be obtained 10 continuously, processed into nearsighted images (16, 18 and 20), further processed (30, 32 and 34) to determine average font height, and used in a feedback loop 36 to facilitate repositioning the handheld electronic device 22. Generally, then, the process may use real-time feedback 66 on the size of the object in the source images 12 to determine and provide feedback or otherwise facilitate improved relative positioning of the handheld electronic device 22 and the foreground document 24 to improve OCR accuracy.

FIG. 16 shows a graphical display 40 on the handheld electronic device 22. The graphical display 40 includes an image of a foreground document 24 that is currently being processed by a processor of the handheld electronic device 22 to be nearsighted in real-time. The graphical display 40 also includes a capture button 42 and a slider bar 44. The capture button 42 activates capture, storage and/or transmission of the image and/or the results of an OCR process on the image, preferably when the application communicates appropriate positioning of the device. Alternatively or in addition, the application may have an automated feature where the image is automatically captured for further storage or processing when within the appropriate range of positions.

The slider bar 44 shows a range of relative positioning, within the center bar, in which the slider may fall and still be within the preferred focal length of the camera. At a frame rate of 20 or 30 frames per second, the slider would readjust based on the current relative positioning. Moving too far out or in would cause the slider to move down or up outside the center bar and/or the center bar to flash a red color. When within the preferred range, the slider bar and center bar may turn green to signal that the image is ready for capturing and further processing. FIG. 17 shows a schematic of the relative (1 m along the optical axis) positioning of the lens of the camera with respect to the character “A” on the foreground document 24. The inventors have found that, remarkably, the feedback system disclosed herein can improve positioning to within 1 inch (plus or minus) of the focal length of the lens.
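One way such per-frame feedback might be derived from the average font height is sketched below. The target range of 10 to 16 pixels, the function name and the returned labels are all hypothetical placeholders chosen for illustration; the specification does not state them. The sketch assumes that characters appear smaller when the camera is too far away and larger when it is too close.

```python
def positioning_feedback(avg_height, target_min=10.0, target_max=16.0):
    """Map an average character height (pixels) to a repositioning hint.

    target_min and target_max are hypothetical bounds of the preferred range.
    """
    if avg_height <= 0:
        return "no_text"       # nothing measurable in this frame
    if avg_height < target_min:
        return "move_closer"   # characters too small: camera too far out
    if avg_height > target_max:
        return "move_away"     # characters too large: camera too far in
    return "in_range"          # ready for capture


# Evaluated once per frame, e.g. 20 to 30 times per second
hints = [positioning_feedback(h) for h in (0.0, 5.0, 12.0, 25.0)]
```

A display layer could then map "in_range" to the green state and the other hints to the slider moving outside the center bar.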

The process of measuring the size of objects such as text fonts in real-time using a mobile electronic device (such as a video camera on a smart phone, tablet or some other moveable electronic or computing device with access to processing power) allows for a wide range of applications. Captured images have improved sizing and resolution for later comparisons in applications such as OCR or virtual reality marker detection. The advantages of this process are not limited to OCR. Any comparison-based computer vision application will benefit when a known-size object is presented before processing. The approach presented here operates in real-time at 20-30 fps on a mobile device, allowing for user feedback to get the optimal focal length and object size during image capture. This process is set apart from any other attempts by an accuracy of 1 inch or 25.4 mm while detecting nearsighted objects on a document or foreground.

Referring now to FIG. 19, an exemplary block diagram of an entity capable of operating as a handheld electronic device 22 is shown in accordance with one embodiment of the present invention. The entity capable of operating as a handheld electronic device 22 includes various means for performing one or more functions in accordance with embodiments of the present invention, including those more particularly shown and described herein. It should be understood, however, that one or more of the entities may include alternative means for performing one or more like functions, without departing from the spirit and scope of the present invention. As shown, the entity capable of operating as a handheld electronic device 22 can generally include means, such as a processor 210, for performing or controlling the various functions of the entity. In particular, the processor 210 may be configured to perform the processes discussed in more detail with regard to FIGS. 1A and 1B.

In one embodiment, the processor is in communication with or includes memory 220, such as volatile and/or non-volatile memory that stores content, data or the like. For example, the memory 220 may store content transmitted from, and/or received by, the entity. Also for example, the memory 220 may store software applications, instructions or the like for the processor to perform steps associated with operation of the entity in accordance with embodiments of the present invention. In particular, the memory 220 may store software applications, instructions or the like for the processor to perform the operations described above with regard to FIGS. 1A and 1B.

In addition to the memory 220, the processor 210 can also be connected to at least one interface or other means for displaying, transmitting and/or receiving data, content or the like. In this regard, the interface(s) can include at least one communication interface 230 or other means for transmitting and/or receiving data, content or the like, as well as at least one user interface that can include a display 240 and/or a user input interface 250. The user input interface, in turn, can comprise any of a number of devices allowing the entity to receive data, such as a keypad, a touch display, a joystick, a camera or other input device.

Reference is now made to FIG. 20, which illustrates one type of electronic device that would benefit from embodiments of the present invention. As shown, the electronic device may be a handheld electronic device 22, and, in particular, a cellular telephone. It should be understood, however, that the device illustrated and hereinafter described is merely illustrative of one type of electronic device that would benefit from the present invention and, therefore, should not be taken to limit the scope of the present invention. While several embodiments of the handheld electronic device 22 are illustrated and will be hereinafter described for purposes of example, other types of mobile stations, such as personal digital assistants (PDAs), pagers, laptop computers, as well as other types of electronic systems including both mobile, wireless devices and fixed, wireline devices, can readily employ embodiments of the present invention.

The handheld electronic device 22 includes various means for performing one or more functions in accordance with embodiments of the present invention, including those more particularly shown and described herein. It should be understood, however, that the mobile station may include alternative means for performing one or more like functions, without departing from the spirit and scope of the present invention. More particularly, for example, as shown in FIG. 20, in addition to an antenna 302, the handheld electronic device 22 includes a transmitter 304, a receiver 306, and an apparatus that includes means, such as a processor 308, controller or the like, that provides signals to and receives signals from the transmitter 304 and receiver 306, respectively, and that performs the various other functions described below including, for example, the functions relating to the processes described in relation to FIGS. 1A and 1B.

As one of ordinary skill in the art would recognize, the signals provided to and received from the transmitter 304 and receiver 306, respectively, may include signaling information in accordance with the air interface standard of the applicable cellular system and also user speech and/or user generated data. In this regard, the mobile station can be capable of operating with one or more air interface standards, communication protocols, modulation types, and access types. More particularly, the mobile station can be capable of operating in accordance with any of a number of second-generation (2G), 2.5G, 3G, 4G, 4G LTE communication protocols or the like. Further, for example, the mobile station can be capable of operating in accordance with any of a number of different wireless networking techniques, including Bluetooth, IEEE 802.11 WLAN (or Wi-Fi®), IEEE 802.16 WiMAX, ultra wideband (UWB), and the like.

It is understood that the processor 308, controller or other computing device, may include the circuitry required for implementing the video, audio, and logic functions of the mobile station and may be capable of executing application programs for implementing the functionality discussed herein. For example, the processor may be comprised of various means including a digital signal processor device, a microprocessor device, and various analog to digital converters, digital to analog converters, and other support circuits. The control and signal processing functions of the mobile device are allocated between these devices according to their respective capabilities. The processor 308 thus also includes the functionality to convolutionally encode and interleave message and data prior to modulation and transmission. Further, the processor 308 may include the functionality to operate one or more software applications, which may be stored in memory. For example, the controller may be capable of operating a connectivity program, such as a conventional Web browser. The connectivity program may then allow the mobile station to transmit and receive Web content, such as according to HTTP and/or the Wireless Application Protocol (WAP), for example.

The mobile station may also comprise means such as a user interface including, for example, a conventional earphone or speaker 310, a ringer 312, a microphone 314, and a display 316, all of which are coupled to the processor 308. The user input interface, which allows the mobile device to receive data, can comprise any of a number of devices allowing the mobile device to receive data, such as a keypad 318, a touch display (not shown), a microphone 314, or other input device. In embodiments including a keypad, the keypad can include the conventional numeric (0-9) and related keys (#, *), and other keys used for operating the mobile station, and may include a full set of alphanumeric keys or a set of keys that may be activated to provide a full set of alphanumeric keys. Although not shown, the mobile station may include a battery, such as a vibrating battery pack, for powering the various circuits that are required to operate the mobile station, as well as optionally providing mechanical vibration as a detectable output.

The mobile station can also include means, such as memory including, for example, a subscriber identity module (SIM) 320, a removable user identity module (R-UIM) (not shown), or the like, which may store information elements related to a mobile subscriber. In addition to the SIM, the mobile device can include other memory. In this regard, the mobile station can include volatile memory 322, as well as other non-volatile memory 324, which can be embedded and/or may be removable. For example, the other non-volatile memory may be embedded or removable multimedia memory cards (MMCs), secure digital (SD) memory cards, Memory Sticks, EEPROM, flash memory, hard disk, or the like. The memory can store any of a number of pieces or amount of information and data used by the mobile device to implement the functions of the mobile station. For example, the memory can store an identifier, such as an international mobile equipment identification (IMEI) code, international mobile subscriber identification (IMSI) code, mobile device integrated services digital network (MSISDN) code, or the like, capable of uniquely identifying the mobile device. The memory can also store content. The memory may, for example, store computer program code for an application and other computer programs. For example, in one embodiment of the present invention, the memory may store computer program code for performing the processes associated with FIGS. 1A and 1B, as described herein.

While the methods and systems have been described in connection with preferred embodiments and specific examples, it is not intended that the scope be limited to the particular embodiments set forth, as the embodiments herein are intended in all respects to be illustrative rather than restrictive.

Unless otherwise expressly stated, it is in no way intended that any method set forth herein be construed as requiring that its steps be performed in a specific order. Accordingly, where a method claim does not actually recite an order to be followed by its steps or it is not otherwise specifically stated in the claims or descriptions that the steps are to be limited to a specific order, it is in no way intended that an order be inferred, in any respect. This holds for any possible non-express basis for interpretation, including: matters of logic with respect to arrangement of steps or operational flow; plain meaning derived from grammatical organization or punctuation; the number or type of embodiments described in the specification.

Implementations of the present invention provide many advantages. Measurement of the distance of the lens from the paper facilitates capture of a font object size for improved clarity. The improved clarity results in improved OCR recognition rates as compared to freehand capture of the image. Implementations also provide an ability to calculate optimal font size for OCR detection on a live video feed while accounting for optimal focus and clarity. Implementations of the present invention can measure and record optimal focal length and OCR font size ranges on a raw video feed. These measurements can be used to guide the camera user through visual cues and indicators to move the camera to the best location in space. This produces a better OCR-compatible image for text recognition. The focal ratio determines how much light is picked up by the CCD chip in a given amount of time. The number of pixels in the CCD chip will determine the size of a font text character matrix. More pixels means a bigger font size, regardless of the physical size of the pixels. OCR engines have an expected and optimal size range for character comparison. When fonts are in the optimal range and have clear, crisp, well-defined edges, OCR detection and accuracy are improved. Implementations of the present invention provide guidance to that optimal range.

It will be apparent to those skilled in the art that various modifications and variations can be made without departing from the scope or spirit. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit being indicated by the following claims.

1-47. (canceled)
48. A method of generating, during acquisition via a camera of an image of a document, a plurality of pre-processed images of the document, the pre-processed images being used to optimize a capture position of the camera when capturing the image of the document for optical character recognition, the method comprising: obtaining a plurality of source images, including a first source image and a second source image, continuously acquired via the camera of a computing device, each of the obtained plurality of source images containing characters associated with the document, wherein the first source image is acquired by the camera at a first capture position, and wherein the second source image is acquired by the camera at a second capture position, wherein the first capture position is different from the second capture position; for each of the plurality of obtained source images, pre-processing a given obtained source image to generate, by a processor of the computing device, a pre-processed image of the given obtained source image; and presenting, on a graphical user interface, via a display of the computing device, i) the pre-processed image and ii) a graphical indicator to guide physical repositioning of the camera to capture an image to be used by an optical character recognition operation to determine characters of the image of the document, wherein the graphical widget presents one or more parameters associated with the camera being in a determined appropriate range position.
49. The method of claim 48, wherein the pre-processing emphasizes characters associated with a foreground portion of the image of the document and attenuates image objects that are background to the characters associated with the foreground portion.
50. The method of claim 48, wherein the pre-processing comprises: detecting edges of the characters in the given obtained source image using an image processing operation, thickening edges of the characters of the detected edge characters to generate a first intermediate image data using a second image processing operation, and thresholding the first intermediate image data.
51. The method of claim 48, wherein the appropriate range position is determined when a focal plane associated with the camera is square with the image of the document.
52. The method of claim 48, wherein the appropriate range position is determined at an optimum focal length of the camera to the document.
53. The method of claim 48, wherein the appropriate range position is determined at an acceptable focal length of the camera to the document.
54. The method of claim 50, wherein the operation of detecting and thickening edges of the characters includes using a Sobel operator.
55. The method of claim 48, further comprising: automatically capturing and storing the pre-processed image when the document is within a range of positions, the range being determined based on the characters associated with the document in the pre-processed image.
56. The method of claim 48, further comprising determining an average font height for the characters; and performing optical character recognition using the determined average font height.
57. The method of claim 48, further comprising performing optical character recognition.
58. The method of claim 48, wherein the computing device comprises a handheld electronic device.
59. The method of claim 48, wherein the plurality of source images are captured as a video feed by the camera of the computing device.
60. A system of generating, during acquisition via a camera of an image of a document, a plurality of pre-processed images of the document, the pre-processed images being used to optimize a capture position of the camera when capturing the image of the document for optical character recognition, the system comprising: a camera; a processor; and a memory operatively coupled to the processor, the memory having instructions stored thereon, wherein execution of the instructions by the processor, cause the processor to: obtain a plurality of source images, including a first source image and a second source image, continuously acquired via the camera, each of the obtained plurality of source images containing characters associated with the document, wherein the first source image is acquired by the camera at a first capture position, and wherein the second source image is acquired by the camera at a second capture position, wherein the first capture position is different from the second capture position; for each of the plurality of obtained source images, pre-process a given obtained source image to generate a pre-processed image of the given obtained source image; and present, on a graphical user interface, via a display, i) the pre-processed image and ii) a graphical indicator to guide physical repositioning of the camera to capture an image to be used by an optical character recognition operation to determine characters of the image of the document, wherein the graphical widget presents one or more parameters associated with the camera being in a determined appropriate range position.
61. The system of claim 60, wherein the pre-processing emphasizes characters associated with a foreground portion of the image of the document and attenuates image objects that are background to the characters associated with the foreground portion by detecting edges of the characters in the given obtained source image using an image processing operation, thickening edges of the characters of the detected edge characters to generate a first intermediate image data using a second image processing operation, and thresholding the first intermediate image data.
62. The system of claim 60, wherein the appropriate range position is determined at an optimum focal length of the camera to the document.
63. The system of claim 60, wherein the appropriate range position is determined at an acceptable focal length of the camera to the document.
64. The system of claim 60, wherein the instructions, when executed by the processor, cause the processor to: automatically capture and store the pre-processed image when the document is within a range of positions, the range being determined based on the characters associated with the document in the pre-processed image.
65. The system of claim 60, wherein the system comprises a handheld electronic device.
66. The system of claim 60, wherein the camera operates to capture images at 20-30 frames per second.
67. A non-transitory computer readable medium for capturing the image of the document for optical character recognition, the computer readable medium having instructions stored thereon, wherein execution of the instructions by a processor of a computing device, cause the processor to: obtain a plurality of source images, including a first source image and a second source image, continuously acquired via a camera, each of the obtained plurality of source images containing characters associated with the document, wherein the first source image is acquired by the camera at a first capture position, and wherein the second source image is acquired by the camera at a second capture position, wherein the first capture position is different from the second capture position; for each of the plurality of obtained source images, pre-process a given obtained source image to generate a pre-processed image of the given obtained source image; and present, on a graphical user interface, via a display of the computing device, i) the pre-processed image and ii) a graphical indicator to guide physical repositioning of the camera to capture an image to be used by an optical character recognition operation to determine characters of the image of the document, wherein the graphical widget presents one or more parameters associated with the camera being in a determined appropriate range position.